Topic Modeling on Podcast Short-Text Metadata

Abstract

Podcasts have emerged as massively consumed online content, notably due to the wider accessibility of production means and scaled distribution through large streaming platforms. Categorization systems and information access technologies typically use topics as the primary way to organize or navigate podcast collections. However, annotating podcasts with topics is still quite problematic because the assigned editorial genres are broad, heterogeneous or misleading, or because of data challenges (e.g. short metadata text, noisy transcripts). Here, we assess the feasibility of discovering relevant topics from podcast metadata, titles and descriptions, using topic modeling techniques for short text. We also propose a new strategy to leverage named entities (NEs), often present in podcast metadata, in a Non-negative Matrix Factorization (NMF) topic modeling framework. Our experiments on two existing datasets from Spotify and iTunes, and on Deezer, a new dataset from an online service providing a catalog of podcasts, show that our proposed document representation, NEiCE, leads to improved topic coherence over the baselines. We release the code for experimental reproducibility of the results ( https://github.com/deezer/podcast-topic-modeling ).

Similar resources

Topic Modeling and Classification of Cyberspace Papers Using Text Mining

The global cyberspace networks provide individuals with platforms where they can interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...

Short and Sparse Text Topic Modeling via Self-Aggregation

The overwhelming amount of short text data on social media and elsewhere has posed great challenges to topic modeling due to the sparsity problem. Most existing attempts to alleviate this problem resort to heuristic strategies to aggregate short texts into pseudo-documents before the application of standard topic modeling. Although such strategies cannot be well generalized to more general genr...

Topic Cube: Topic Modeling for OLAP on Multidimensional Text Databases

As the amount of textual information grows explosively in various kinds of business systems, it becomes more and more desirable to analyze both structured data records and unstructured text data simultaneously. While online analytical processing (OLAP) techniques have been proven very useful for analyzing and mining structured data, they face challenges in handling text data. On the other hand,...

Topic Models and Metadata for Visualizing Text Corpora

Effectively exploring and analyzing large text corpora requires visualizations that provide a high level summary. Past work has relied on faceted browsing of document metadata or on natural language processing of document text. In this paper, we present a new web-based tool that integrates topics learned from an unsupervised topic model in a faceted browsing experience. The user can manage topi...

Modeling Topic Dependencies in Hierarchical Text Categorization

In this paper, we encode topic dependencies in hierarchical multi-label Text Categorization (TC) by means of rerankers. We represent reranking hypotheses with several innovative kernels considering both the structure of the hierarchy and the probability of nodes. Additionally, to better investigate the role of category relationships, we consider two interesting cases: (i) traditional schemes in...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2022

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-030-99736-6_32